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DATA PROCESSING AND DIFFERENCE COMPUTATION FOR GENERATING 

ADDRESSING INFORMATION 



FIELD OF THE INVENTION 

This invention is directed to addressing an element in a 
document written in a language such as XML (Extensible 
Markup Language) or HTML (Hypertext Markup Language) . It is 
more particularly directed to updating a designation 
expression for an element when an document is modified. 

BACKGROUND OF THE INVENTION 

Structured documents written in XML, HTML, or other 
languages used for data exchanges over networks such as the 
Internet (referred to as structured documents hereafter) may 
have meta information, such as annotations, that addresses 
particular elements in the structured documents. The 
structured documents may also have modification rules 
written in the documents in advance, under which the 
documents are modified. To add these meta information and 
modification rules to the structured documents, XPath (XML 
Path Language) is often used to address particular positions 
in the structured documents so that external documents are 
referred to. 

XPath is a language for addressing particular parts of a 
structured document. Using XPath as addressing information 
allows arbitrarily specifying those positions in the 
structured document to which annotations are added or 
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1 modifications are made. In the subsequent description, data 

2 written in XPath will also be simply referred to as an 

3 XPath. 

4 Specifically, XPath is written in the following manner. 

5 Figure 18 shows an exemplary structure of an XML document, a 

6 type of structured document. A root element is expressed as 

7 " /" in XPath. Therefore, for the XML document in Figure 18, 

8 an element a is a child element of a root and expressed as 

9 "/a." Elements b and d are expressed as "/a/b" and 

10 "/a/b/d" , respectively. An XPath expression 

11 " //p [id=" f oo" ] " , for example, selects all p elements in an 

12 XML document that have "foo" as their id attributes. 

13 As described above, XPath allows arbitrarily addressing 

14 particular elements in a structured document such as an XML 

15 or HTML/XML document. However, if the structured document 

16 subjected to designation is modified, elements or their 

17 positions in the document change. Therefore, the position 

18 designation in XPath may get out of order, and desired 

19 elements may not be properly addressed. 

20 Conventionally, to keep the desired elements properly 

21 addressed in the structured document in this case, XPath 

22 descriptions have to be modified manually. This requires 

23 significant efforts and imposes a heavy burden on a 

24 developer of a system involving this structured document. 

25 SUMMARY OF THE INVENTION 

26 Thus, an aspect of the invention is to keep a desired 

27 element properly addressed in a structured document in which 
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1 particular elements are addressed, even if the structured 

2 document is modified. 

3 Another aspect of the invention, is to provide means for 

4 automatically updating an XPath addressing a particular 

5 element in a structured document based on a modification 

6 made to the structured document if the structured document 

7 is modified. 

8 An example embodiment of the invention to achieve the above 

9 objects is implemented as a data processing method for 

10 addressing an predetermined element or sets of elements in a 

11 structured document. The data processing method comprises 

12 the steps of: when a structured document having an element 

13 addressed by predetermined addressing information is 

14 modified, inputting the structured document to analyze a 

15 modification; and updating the addressing information 

16 according to the analyzed modification made to the 

17 structured document so that the addressing information 

18 addresses a corresponding element or corresponding elements 

19 in the modified structured document. 

20 In an alternate embodiment of the invention to achieve the 

21 above aspects is also implemented as an addressing 

22 information generation system for performing such data 

23 processing. The addressing information generation system 

24 comprises: a difference computation unit for computing a 

25 difference between structured documents; and an addressing 

26 information generation unit for generating addressing 

27 information from addressing information that addresses a 

28 part of a particular structured document based on 

29 information on the difference computed by the difference 

30 computation unit, the generated addressing information 
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1 addressing a corresponding part of the other structured 

2 document . 

3 BRIEF DESCRIPTION OF THE DRAWINGS 

4 These and other aspects, features, and advantages of the 

5 present invention will become apparent upon further 

6 consideration of the following detailed description of the 

7 invention when read in conjunction with the drawing figures, 

8 in which: 

9 Fig. 1 is a schematic diagram of an exemplary hardware 

10 configuration of a computer suitable for implementing a 

11 method for updating an XPath according to an embodiment; 

12 Fig. 2 shows a configuration of a system for updating an 

13 XPath according to the embodiment implemented in the 

14 computer shown in Figure 1; 

15 Fig. 3 is a flowchart of processing performed by a 

16 difference computation algorithm suitable for the 

17 embodiment, and more particularly an InsertNode analysis; 

18 Fig. 4 is a flowchart of processing performed by the 

19 difference computation algorithm suitable for the 

20 embodiment, and more particularly a RemoveNode analysis; 

21 Fig. 5 is a flowchart of processing performed by the 

22 difference computation algorithm suitable for the 

23 embodiment, and more particularly a Modify analysis; 
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1 Fig. 6 is a flowchart of processing performed by the 

2 difference computation algorithm suitable for the 

3 embodiment, and more particularly the Modify analysis; 

4 Fig. 7 is a functional block diagram of an XPath update unit 

5 in the embodiment; 

6 Fig. 8 shows node correspondences between an unmodified 

7 document P and a modified document P' in the embodiment; 

8 Fig. 9 shows correspondences between a NodeSet S (i) and a 

9 NodeSet S (i) " in the embodiment; 

10 Fig. 10 shows an example of a node correspondence table used 

11 in the embodiment; 

12 Fig. 11 is a flowchart showing a process of generation of an 

13 XPath by an XPath generator according to the embodiment; 

14 Fig. 12 shows examples of the unmodified document P and the 

15 modified document P'; 

16 Fig. 13 shows an XPath updated according to the modified 

17 document P ' ; 

18 Fig. 14 shows another examples of the unmodified document P 

19 and the modified document P'; 

20 Fig. 15 shows an exemplary annotation system provided with 

21 an XPath update tool according to the embodiment; 

22 Fig. 16 shows an exemplary difference computation for trees; 
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1 Fig. 17 shows another exemplary difference computation for 

2 trees ; and 



3 Fig. 18 shows an example of an XML document. 
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15 DESCRIPTION OF THE INVENTION 

16 The present invention provides methods, apparatus and 

17 systems to keep a desired element properly addressed in a 

18 structured document in which particular elements are 

19 addressed, even if the structured document is modified. 

20 The invention also provides means for automatically updating 

21 an XPath addressing a particular element in a structured 

22 document based on a modification made to the structured 

23 document if the structured document is modified. 

24 An example of a method of the invention, is implemented as a 

25 data processing method for addressing predetermined element 
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1 or sets of elements in a structured document. The method 

2 includes the steps of: when a structured document having an 

3 element addressed by predetermined addressing information is 

4 modified, inputting the structured document to analyze a 

5 modification; and updating the addressing information 

6 according to the analyzed modification made to the 

7 structured document so that the addressing information 

8 addresses a corresponding element or corresponding elements 

9 in the modified structured document. 

10 Specifically, the step of analyzing a modification made to 

11 the structured document comprises: converting an unmodified 

12 version and a modified version of the structured document 

13 into tree-structured data items; and computing a difference 

14 between the tree-structured data items. The addressing 

15 information is updated based on the difference between the 

16 tree-structured data items. 

17 More specifically, the processing of computing the 

18 difference between the tree-structured data items is 

19 performed to track a component of the tree-structured data 

20 items that is moved in operations required for 

21 transformation between the tree-structured data items 

22 transformed from one to the other according to modification 

23 of the structured document. 

24 Preferably, an XPath may be used as the addressing 

25 information for addressing the element in the structured 

26 document . 

27 Then, updating the addressing information comprises updating 

28 an XPath describing the addressing information by 

29 regenerating LocationSteps forming the XPath based on the 
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1 difference between the unmodified version and the modified 

2 version of the structured document. 

3 The invention to achieve the above objects is also 

4 implemented as an addressing information generation system 

5 for performing such data processing. The addressing 

6 information generation system comprises: a difference 

7 computation unit for computing a difference between 

8 structured documents; and an addressing information 

9 generation unit for generating addressing information from 

10 addressing information that addresses a part of a particular 

11 structured document based on information on the difference 

12 computed by the difference computation unit, the generated 

13 addressing information addressing a corresponding part of 

14 the other structured document. 

15 More preferably, the addressing information generation 

16 system further comprises a document analysis unit for 

17 analyzing structures of the structured documents and 

18 converting the structures into tree-structured data items, 

19 wherein the difference computation unit computes the 

20 difference by comparing the tree-structured data items 

21 corresponding to the structured documents converted by the 

22 document analysis unit. 

23 The invention to achieve the above objects may also be 

24 implemented as a method for computing a difference between 

25 at least two tree-structured data items. The method 

26 comprises the steps of: reading at least two tree-structured 

27 data items to be processed from memory to compare the at 

28 least two tree-structured data items and creating an 

29 operation sequence, in which each operation for transforming 

30 one of the tree-structured data items into the other 
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1 tree-structured data item is expressed as a combination of 

2 predetermined operations on a component of a tree-structure; 

3 and changing operations in the operation sequence that are 

4 interpreted as a movement of a component into an operation 

5 of moving the component. 

6 The components of the tree-structures include nodes and 

7 subtrees of the trees. The combination of predetermined 

8 operations on a component of the tree-structure is a 

9 combination of basic operations such as inserting, removing, 

10 and modifying the component. 

11 More specifically, the step of changing the operation 

12 sequence in the list comprises adding an operation of moving 

13 a component of the tree-structured data items to the 

14 operation sequences in place of a pair of operations of 

15 removing and inserting the component in the operation 

16 sequences. 

17 The step further comprises replacing, based on a 

18 predetermined rule, an operation of modifying a component of 

19 the tree-structured data items in the operation sequences 

20 with a different operation that involves moving the 

21 component. 

22 The invention to achieve the above objects is also 

23 implemented as an annotation server for managing annotation 

24 data made for an HTML /XML document. The annotation server 

25 comprises: difference computation means for computing, when 

26 the HTML / XML document for which the annotation data has been 

27 made is modified, a difference between an unmodified version 

28 and a modified version of the HTML /XML document; and XPath 

29 update means for updating, based on difference information 



Dock t Ntimber JP920020093US1 



- 9 - 



1 obtained from computation by the difference computation 

2 means, an XPath associating the annotation data with a part 

3 of the HTML / XML document. 

4 The invention to achieve the above objects is also 

5 implemented as a program for controlling a computer so that 

6 the computer performs processing corresponding to the steps 

7 of the data processing method or the method for computing a 

8 difference described above, or the invention is also 

9 implemented as a program for causing a computer to function 

10 as the system for updating addressing information or the 

11 annotation server described above. The program may be 

12 stored in and distributed as a magnetic disk, optical disk, 

13 semiconductor memory, or other storage media, or distributed 

14 through a network. 

15 Now, the invention will be described in detail below based 

16 on an embodiment illustrated in the appended drawings. 

17 Figure 1 is a schematic diagram of an exemplary hardware 

18 configuration of a computer suitable for implementing a 

19 method for updating an XPath according to this embodiment. 

20 The computer shown in Figure 1 includes a CPU (Central 

21 Processing Unit) 101 as operation means; main memory 103 

22 connected to the CPU 101 via a M/B (motherboard) chipset 102 

23 and a CPU bus; a video card 104 also connected to the CPU 

24 101 via the M/B chipset 102 and an AGP (Accelerated Graphics 

25 Port); a hard disk 105, a network interface 106, and a USB 

26 port 107 connected to the M/B chipset 102 via a PCI 

27 (Peripheral Component Interconnect) bus; and a floppy disk 

28 drive 109 and a keyboard/mouse 110 connected to the M/B 
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1 chipset 102 via the PCI bus over a bridge circuit 108 and a 

2 low-speed bus such as an ISA (Industry Standard 

3 Architecture) bus. 

4 It is noted that Figure 1 is a mere illustration of a 

5 hardware configuration of a computer for realizing this 

6 embodiment, and various other configurations to which this 

7 embodiment can be applied may also be employed. For 

8 example, only video memory may be provided instead of the 

9 video card 104, in which case the CPU 101 processes image 

10 data. Further, a CD-ROM (Compact Disc Read Only Memory) 

11 driver or a DVD-ROM (Digital Versatile Disc Read Only 

12 Memory) driver may be provided via an interface such as ATA 

13 (AT Attachment) . 

14 Figure 2 shows a configuration of a system for updating an 

15 XPath according to this embodiment implemented in the 

16 computer shown in Figure 1. 

17 As shown in Figure 2, the system for updating an XPath 

18 according to this embodiment includes a document analysis 

19 unit 10 for analyzing structures of a structured document, a 

20 difference computation unit 2 0 for checking modifications 

21 made to the structured document based on an analysis result 

22 of the document analysis unit 10, and an XPath update unit 

23 3 0 for updating an XPath description, which is addressing 

24 information, based on a computation result of the difference 

25 computation unit 20. 

26 These components are virtual software blocks provided by a 

27 program that is deployed in the main memory 103 shown in 

28 Figure 1 and controls the CPU 101. The program that 

29 controls the CPU 101 to provide these functions may be 
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1 stored in and distributed as a magnetic disk, optical disk, 

2 semiconductor memory, or other storage media, or distributed 

3 through a network. In this embodiment, the program is input 

4 via the network interface 106 or the floppy disk drive 109 

5 shown in Figure 1, or a CD-ROM drive (not shown) and stored 

6 in the hard disk 105. Then, the program stored in the hard 

7 disk 105 is loaded and deployed in the main memory 103, and 

8 executed by the CPU 101 to provide the functions of the 

9 components shown in Figure 2 . 

10 The structured documents and the XPath to be processed are 

11 stored in a predetermined area, for example, an area in the 

12 hard disk 105, and read by the CPU 101 for XPath update 

13 processing according to this embodiment. 

14 In this embodiment shown in Figure 2, the document analysis 

15 unit 10 analyzes the structured documents and converts them 

16 into data in a tree-structure such as a DOM tree (the data 

17 will be simply referred to a tree hereafter) . The documents 

18 to be converted are an unmodified version and a modified 

19 version of a modified structured document. That is, given a 

20 structured document to be processed (a modified structured 

21 document) stored in memory means such as the hard disk 105, 

22 the document analysis unit 10 reads and analyzes an 

23 unmodified version (called an unmodified document hereafter) 

24 P and a modified version (called a modified document 

25 hereafter) P' of the structured document, generates a tree T 

26 (corresponding to the unmodified document P) and a tree T' 

27 (corresponding to the modified document P')/ and outputs the 

28 trees. The output trees T and T' are temporarily stored in 

29 memory means such as the main memory 103 to be used by the 

30 difference computation unit 20. A conventional technique 

31 may be used as a conversion algorithm for generating the 



Docket Number JP920020093US1 



- 12 - 



1 trees of the structured document. 

2 The difference computation unit 20 computes differences 

3 between the trees of the unmodified and modified structured 

4 documents converted by the document analysis unit 10. As a 

5 result, the details of the modifications made to the 

6 structured document to be processed are recognized. This 

7 embodiment proposes a novel method for computing the 

8 differences suitable for the XPath update to be performed 

9 later. Now, a difference computation algorithm of this 

10 method will be described below. 

11 As background knowledge, a conventional difference 

12 computation algorithm generally used will be described. 

13 While various algorithms have been proposed for computing 

14 differences between two trees, a typical difference 

15 computation algorithm is the one that computes a 

16 minimum-cost operation sequence. 

17 Figures 16 and 17 show exemplary difference computations for 

18 trees. 

19 As shown in Figure 16, given a tree structure with a parent 

20 node a and child nodes b, c, and d, a difference is computed 

21 between two trees 161 and 162 in which the positions of 

22 nodes b and d are exchanged. The processing cost of 

23 transformation between the trees is 1 for each of basic 

24 operations; RemoveNode for removing a node, InsertNode for 

25 inserting a node, and Modify for modifying the content of a 

26 node . 

27 In this case, the algorithm for computing a minimum-cost 

28 operation sequence computes to determine that the 
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1 transformation of the tree 161 into the tree 162 requires 

2 operations of modifying the content of the node b into the 

3 content of the node d (Modify (b— >d) ) and modifying the 

4 content of the node d into the content of the node b (Modify 

5 (d->b) ) . This is because these operations enable the tree 

6 161 to be transformed into the tree 162 at the minimum 

7 processing cost of 2 according to the above mentioned 

8 processing cost value. 

9 As shown in Figure 17, given a tree structure with a parent 

10 node a and child nodes b, c, and d, a difference is computed 

11 between two trees 171 and 172 in which the position of the 

12 node b relative to the nodes c and d is different. 

13 In this case, the algorithm for computing a minimum-cost 

14 operation sequence computes to determine that transformation 

15 of the tree 171 into the tree 172 requires operations of 

16 removing the node b from the tree 171 (RemoveNode (b) ) and 

17 inserting the node b into a position shown in the tree 172 

18 (InsertNode (b) ) . Again, this is because these operations 

19 enable the tree 171 to be transformed into the tree 172 at 

20 the minimum processing cost 2 according to the above 

21 mentioned processing cost value. 

22 However, this algorithm for computing a minimum-cost 

23 operation sequence is not suitable for this embodiment, 

24 because the aim of the difference computation in this 

25 embodiment is the automatic XPath update. In the example of 

26 Figure 16, if an XPath addresses the node b to the tree 161, 

27 the target of the XPath cannot be properly modified only 

28 with information that the content of the node is modified 

29 (from b into d) . Here, the transformation of the tree 161 

30 into the tree 162 can be interpreted as operations of moving 
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1 the nodes b and d such that their positions are exchanged. 

2 Based on this information, the description of the XPath 

3 addressing the node b can be properly modified. Therefore, 

4 for use as the information for modifying the XPath, the 

5 operation sequence for transforming the tree 161 into the 

6 tree 162 can be more appropriately expressed with MoveNode, 

7 an operation of moving a node, as MoveNode (b) and MoveNode 

8 (d) . However, MoveNode is a combination of the basic 

9 operations RemoveNode and InsertNode described above 

10 (operations of removing and inserting a node) , thereby 

11 increasing the total cost of transformation of the tree 161 

12 into the tree 162 to 4. Thus, the algorithm for computing a 

13 minimum-cost operation sequence cannot detect MoveNode. 

14 Similarly, in the example of Figure 17, the target of the 

15 XPath cannot be properly modified only with information on 

16 one of the operations of removing the node b and inserting 

17 the node b. Here, the transformation of the tree 171 into 

18 the tree 172 can be interpreted as an operation of moving 

19 the node.b from the left of the node c to the right of the 

20 node d. Based on this information, the description of the 

21 XPath addressing the node b can be properly modified. 

22 Therefore, the operation sequence for transforming the tree 

23 171 into the tree 172 can be more appropriately expressed as 

24 MoveNode (b) . In this example, the total cost remains as 2 

25 because MoveNode is a combination of RemoveNode and 

26 InsertNode. However, the algorithm for computing a 

27 minimum-cost operation sequence does not guarantee to detect 

28 MoveNode and thereby is not suitable for this embodiment. 

29 The above discussion also applies to MoveTree, an operation 

30 of moving a subtree (partial tree structure within a tree) . 

31 It should be understood that although the subsequent 
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1 description addresses only the processing of nodes for 

2 simplicity, MoveTree may be similarly analyzed. 

3 Based on the above discussion, a description will be given 

4 of the difference computation algorithm executed by the 

5 difference computation unit 20 and suitable for this 

6 embodiment. The difference computation algorithm used in 

7 this embodiment is designed to track objects (nodes and 

8 subtrees) that have been moved due to modification of a 

9 tree. 

10 Figures 3 to 6 are flowcharts describing processing 

11 performed by the difference computation algorithm suitable 

12 for this embodiment. 

13 The difference computation unit 2 0 receives inputs of the 

14 tree T corresponding to the unmodified document P and the 

15 tree T 1 corresponding to the modified document P' from 

16 memory means such as the main memory 103, where the trees 

17 have been temporarily stored. Then, it analyzes operations 

18 required for modifying the tree T into the tree T' in terms 

19 of the basic operations, RemoveNode, InsertNode, and Modify, 

20 and generates a list L of obtained operation sequences. The 

21 analysis may be performed using a conventional technique, 

22 such as the above described algorithm for computing a 

23 minimum-cost operation sequence. The generated list L of 

24 the operation sequences is temporarily stored in memory 

25 means such as the main memory 103. Then, the difference 

26 computation unit 2 0 analyzes the list L stored in the main 

27 memory 103 to detect MoveNode as shown in Figures 3 to 6 . 

28 In an InsertNode analysis shown in Figure 3, the difference 

29 computation unit 20 first takes a certain InsertNode 
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1 (InsertNode (n) ) in the list L as its focus (step 301). 

2 Then, it checks whether the list L has an operation 

3 RemoveNode for the same node as the target node of the focus 

4 InsertNode (in this figure, node n) (step 302) . If the 

5 corresponding RemoveNode (RemoveNode (n) ) is not in the list 

6 L, the node n is a node newly added to the tree T' . 

7 Therefore, the processing simply terminates, 

8 If a RemoveNode (n) is in the list L, it makes up a MoveNode 

9 (n) in combination with the InsertNode (n) . Therefore, a 

10 MoveNode (n) is added to the list L (step 303), and the 

11 InsertNode (n) and the RemoveNode (n) are deleted from the 

12 list L (step 304). In this manner, the difference 

13 computation unit 2 0 processes all InsertNode in the list L. 

14 In a RemoveNode analysis shown in Figure 4, the difference 

15 computation unit 2 0 first takes a certain RemoveNode 

16 (RemoveNode (n) ) in the list L as its focus (step 401). 

17 Then, it checks whether the list L has an operation 

18 InsertNode for the same node as the target node of the focus 

19 RemoveNode (in this figure, node n) (step 402) . If the 

20 corresponding InsertNode (InsertNode (n) ) is not in the list 

21 L, the node n is a node removed from the tree T' . 

22 Therefore, the processing simply terminates. 

23 If an InsertNode (n) is in the list L, it makes up a 

24 MoveNode (n) in combination with the RemoveNode (n) . 

25 Therefore, a MoveNode (n) is added to the list L (step 403), 

26 and the RemoveNode (n) and the InsertNode (n) are deleted 

27 from the list L (step 404) . In this manner, the difference 

28 computation unit 20 processes all RemoveNode in the list L. 

29 In a Modify analysis shown in Figures 5 and 6, the 
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1 difference computation unit 20 first takes an operation 

2 Modify (nl-»nx) for modifying the content of a node nl in 

3 the list L as its focus (step 501) . Then, it checks whether 

4 the list L has an operation Modify (ny-»nl) for inversely 

5 modifying the content of a node into the nl (step 502) . 

6 If a Modify (ny-»nl) is in the list L, then the difference 

7 computation unit 20 checks whether the content of the node 

8 nx is identical with the content of the node ny (that is, nx 

9 = ny) (step 503) . If nx = ny, it can be interpreted to mean 

10 that the positions of the node nl and nx (= ny) have been 

11 exchanged. Therefore, a Movenode (nl) and a Movenode (ny) 

12 are added to the list L (step 504), and the Modify (nl-»nx) 

13 and the Modify (ny->nl) are deleted from the list L (step 

14 513). 

15 If nx * ny, it can be interpreted to mean that the node nl 

16 has been moved to the original position of the node ny in 

17 the tree T, the node ny has been removed, and another node 

18 nx has been newly inserted into the original position of the 

19 node nl . Therefore, an InsertNode (nx) , a RemoveNode (ny) , 

20 and a Movenode (nl) are added to the list L (step 505) , and 

21 further the InsertNode analysis and the RemoveNode analysis 

22 shown in Figures 3 and 4 are performed (step 506) . Then, 

23 the Modify (nl-»nx) and the Modify (ny-*nl) are deleted from 

24 the list L (step 513). 

25 If a Modify (ny-»nl) is not in the list L in step 502, then 

26 the difference computation unit 20 checks whether an 

27 operation InsertNode (nl) for the node nl is in the list L 

28 (step 507 in Figure 6) . If an InsertNode (nl) is in the 

29 list L, it can be interpreted to mean that the node nl has 

30 been moved to another position, and another node nx has been 
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1 inserted into the original position of the node nl . 

2 Therefore, an InsertNode (nx) and a Movenode (nl) are added 

3 to the list L (step 508), and further the InsertNode 

4 analysis shown in Figure 3 is performed (step 509) . Then, 

5 the Modify (nl-»nx) and the InsertNode (nl) are deleted from 

6 the list L (step 513 in Figure 5) . 

7 If an InsertNode (nl) is not in the list L in step 507, then 

8 the difference computation unit 20 checks whether an 

9 operation RemoveNode (nx) for the node nl is in the list L 

10 (step 510 in Figure 6). If a RemoveNode (nx) is in the list 

11 L, it can be interpreted to mean that the node nl has been 

12 removed, and a node nx has been inserted to that position. 

13 Therefore, a RemoveNode (nl) and a MoveNode (nx) are added 

14 to the list L (step 511), and further the RemoveNode 

15 analysis shown in Figure 3 is performed (step 512). Then, 

16 the Modify (nl-»nx) and the RemoveNode (nx) are deleted from 

17 the list L (step 513 in Figure 5) . 

18 If a RemoveNode (nx) is not in the list L in step 510, it 

19 can be interpreted to mean that the content of the node nl 

20 has been simply modified into nx, and therefore the 

21 processing simply terminates. In this manner, the 

22 difference computation unit 20 processes all Modify in the 

23 list L. 

24 Thus, the differences between the trees T and T' are 

25 computed. The obtained difference data is temporarily 

26 stored in memory means, such as the main memory 103, to be 

27 used by the XPath update unit 30. As realized in these 

28 three analysis, in this embodiment, all operations for the 

29 tree T to be transformed into the tree T' that can be 

30 interpreted as node movements are detected as moving 
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1 operations Move so that they can be used in the subsequent 

2 XPath update processing. 

3 The XPath update unit 30 receives an input of the 

4 computation result of the differences between the trees T 

5 and T f obtained by the difference computation unit 20 and an 

6 input of an XPath for the unmodified document P (referred to 

7 as XPath (P) hereafter) . Based on these inputs, the XPath 

8 update unit 30 then generates and outputs an XPath for the 

9 modified document P' (referred to as XPath (P') hereafter). 

10 Figure 7 is a functional block diagram of the XPath update 

11 unit 30. 

12 Referring to Figure 7, the XPath update unit 30 for 

13 generating addressing information includes a function for 

14 interpreting an XPath (XPath interpreter 31) , a function for 

15 storing information on correspondences between nodes in the 

16 unmodified and modified structured documents (node 

17 correspondence table 32), and a function for generating an 

18 XPath (XPath generator 33). The XPath update unit 30 

19 receives inputs to be processed, that is, the unmodified 

20 document P, the modified document P', the differences D 

21 between the unmodified document P and the modified document 

22 P', and the XPath (P) from memory means such as the main 

23 memory 103 or the hard disk 105. Then, the XPath update 

24 unit 30 generates the XPath (P') with these functions. The 

25 generated XPath (P') is stored in memory means such as the 

26 hard disk 105. 

27 Now, the XPath update processing performed by the XPath 

28 update unit 30 will be described in detail below. 
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1 Figure 8 shows node correspondences between the unmodified 

2 document P and the modified document P' . 

3 The XPath (P) is formed of layers of paths (LocationStep) Ls 

4 (i) (i = 0, 1, 2, . . . , n) . In the unmodified document P, 

5 each set of nodes to be addressed by the LocationStep Ls 

6 (i), which is a NodeSet S (i) , is computed in processing 

7 performed by the XPath interpreter 31. Similarly, in the 

8 modified document P', a NodeSet S (i) ' to be addressed by 

9 the LocationStep Ls (i) is computed. 

10 On the other hand, the node correspondence table 32, which 

11 represents the node correspondences between the unmodified 

12 and modified documents P and P', is generated from the 

13 unmodified document P, the modified document P', and the 

14 differences D between the unmodified and modified documents 

15 P and P'. The generated node correspondence table 32 is 

16 stored in memory means, such as a register of the CPU 101 or 

17 the main memory 103, in the computer shown in Figure 1. An 

18 example of the node correspondence table 32 is shown in 

19 Figure 10. The node correspondence table 32 in this figure 

20 shows that, for example, a node NO in the unmodified 

21 document P corresponds to a node N'O in the modified 

22 document P', and a node N3 in the unmodified document P has 

23 no corresponding node in the modified document P' (the node 

24 N3 has been removed by the modification of the structured 

25 document) . 

26 Based on the node correspondence table 32 and the NodeSet S 

27 (i) to be addressed by the LocationStep Ls (i) in the 

28 unmodified document P, a NodeSet (i) " is obtained. 

29 Figure 9 shows correspondences between the NodeSet S (i) ' 
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1 and the NodeSet S ( i ) " . 



2 The difference between the NodeSet S (i) ' and the NodeSet S 

3 (i) " is that the NodeSet S (i) 1 is obtained simply by 

4 applying path patterns to the modified document P', whereas 

5 the NoedSet S (i) " is obtained by tracking modifications 

6 based on the difference information. It is noted that both 

7 the NodeSet S (i) 1 and the NodeSet S (i) " are sets of nodes 

8 in the modified document P' . 

9 Next, the XPath generator 33 compares the NodeSet S (i) ' and 

10 the NodeSet S (i)", and updates the LocationStep Ls (i) in 

11 the XPath (P) . The details of the update will be described 

12 later. Repeating this process for i (i = 0 to n) provides 

13 LocationStep Ls (j)' (j = 0, 1, 2, m) . This 

14 LocationStep Ls (j) ' directly represents an updated XPath 

15 (P 1 ). 

16 Figure 11 is a flowchart showing a process of generation of 

17 the XPath (LocationStep) by the XPath generator 33. 

18 Referring to Figure 11, the XPath generator 33 first 

19 compares the NodeSet S (i) ' and the NodeSet S (i)" (step 

20 1101). Then, if the NodeSet S (i) ' and the NodeSet S (i) " 

21 are equal or if the NodeSet S (i) ' is included in the 

22 NodeSet S (i)", the LocationStep Ls (i) needs no 

23 modification and is output directly as the LocationStep Ls 

24 (j) ' (step 1102, 1103) . 

25 If the NodeSet S (i) " is included in the NodeSet S (i) ' , 

26 then the XPath generator 33 generates a LocationStep from 

27 the nodes addressed by the LocationStep Ls (j-l) ' to the 

28 nodes included in the NodeSet S (i) " (step 1103, 1104). 
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1 In this manner, the Locations teps corresponding to the 

2 modified document P' are generated, and the XPath (P) is 

3 modified into the XPath (P) ' . 

4 Some types of XPath notation allow the LocationSteps 

5 generated in step 1104 to be integrated into a simple 

6 expression by generalizing them based on a predetermined 

7 generalization rule. If the LocationStep Ls (j)' cannot be 

8 generated based on a given generalization rule, the 

9 LocationSteps generated in step 1104 may be directly output 

10 while processing for an error is performed, such as 

11 displaying an alarm window or a window prompting for 

12 correction. 

13 The generation of the LocationSteps in step 1104 may be 

14 performed, for example, with a known strategy disclosed in 

15 the literature 1 below. The integration of the 

16 LocationSteps may be performed, for example, with a known 

17 strategy disclosed in the literature 2 below. 

18 Literature 1: 2001/11/08: A Visual Approach to Authoring 

19 XPath Expressions Accepted for Markup Languages: Theory and 

20 Practice, Vol. 3, No. 2. This is a paper originally 

21 published in the Proceedings Extreme Markup Languages 2001, 

22 pp. 1-15, Montreal, Canada (14-17, August 2001). 

23 http : //ares . trl . ibm. com/ freedom/doc /extml2 001 /abe0114 .html 

24 Literature 2: 2001/07/13: XSLT Stylesheet Generation by 

25 Example with WYSIWYG Editing Accepted for the presentation 

26 at International Symposium on Applications and the Internet 

27 (SAINT 2002) 

28 http://ares.trl . ibm. com/f reedom/doc/saint2002/saint2002 .html 
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1 Now, the method for updating an XPath will be described 

2 based on examples of the tree modification. 



3 Figures 12 to 14 show examples of the unmodified document P 

4 and the modified document P' . These figures show tree 

5 structures of the structured documents P and P'. Referring 

6 to Figure 12, the unmodified document P has a tree structure 

7 including a root node a and its three child nodes b. The 

8 leftmost node b has two child nodes c, in which the right 

9 node c has one child node b. On the other hand, the 

10 modified document P' has a structure in which the node b 

11 under the node c has been moved to be a child of the node a. 

12 Here, suppose that an XPath (P) " /a/b" for the unmodified 

13 document P addresses the three child nodes b of the node a. 

14 The expression "/a/b" addresses all child nodes b of the 

15 node a. Where the unmodified document P has been modified 

16 into the modified document P\ the "/a/b" would, if used as 

17 it is, address the four child nodes b of the node a in the 

18 modified document P'. However, the node b that has been 

19 moved to be a child of the node also existed in the 

20 unmodified document P and was a node that was not addressed 

21 by the "/a/b". Therefore, it should not be addressed by the 

22 "/a/b" in the modified document P' as well. 

23 In this embodiment, the XPath update unit 30 can refer to 

24 the node correspondence table 32 generated according to the 

25 differences D computed by the difference computation unit 

26 20, and know that the three nodes b addressed by the XPath 

27 (P) in the unmodified document P correspond to the first to 

28 third nodes b from left among the four nodes b in the 

29 modified document P', as shown in Figure 13. Therefore, the 
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1 XPath generator 33 generates an XPath (P') that addresses 

2 only these three nodes b (nodes b that existed in the same 

3 positions in the unmodified document P) . That is, the 

4 expression "/a/b" is modified into the expression 

5 "/a/b [position () >= 3]". 

6 Referring to Figure 14, the unmodified document P is the 

7 same as in Figure 12, whereas in the modified document P', 

8 one of the tree child nodes b of the node a has been 

9 removed. In this case, the NodeSet addressed by the 

10 expression "/a/b" in the modified document P' is included in 

11 the NodeSet addressed by the expression in the unmodified 

12 document P. That is, the expression addresses no redundant 

13 nodes. Therefore, the expression of the XPath needs no 

14 modification. In some applications according to this 

15 embodiment, a user may be notified that one of the addressed 

16 nodes b has been removed in the modified document P' . 

17 As described above, this embodiment enables detecting a 

18 difference between an unmodified version and a modified 

19 version of a modified structured document, and based on the 

20 difference, automatically updating a corresponding XPath. 

21 However, in practice, the XPath may not be updated exactly 

22 according to the intention of a developer of a system 

23 involving the structured document and the XPath. In 

24 addition, the developer may want to further modify the XPath 

25 after it is automatically updated. Therefore, this 

26 embodiment can also be implemented as an interactive XPath 

27 update tool . 

28 Figure 15 shows an exemplary annotation system provided with 

29 such an XPath update tool. In Figure 15, an annotation 

30 server 1500 has functions corresponding to the document 
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1 analysis unit 10, the difference computation unit 20, and 

2 the XPath update unit 3 0 according to this embodiment. 

3 These functions are provided as functions of a 

4 program- control led CPU in a computer embodying the 

5 annotation server 1500. A display unit of a console 1510 

6 operated by an annotation developer displays a structured 

7 document 1511 to be processed (for example, an HTML / XML 

8 document) and an interaction window 1512 for updating an 

9 XPath . 

10 When the certain structured document 1511 annotated under 

11 the control of the annotation server 1500 is modified, the 

12 annotation server 1500 causes the display unit of the 

13 console 1510 to display an unmodified version and a modified 

14 version of the structured document 1511, and the interaction 

15 window 1512. The annotation server 1500 then asks the 

16 annotation developer whether to update the XPath according 

17 to the modification made to the structured document 1511. 

18 If the annotation developer clicks on the button "Yes" on 

19 the interaction window 1512, the XPath is automatically 

20 updated by the functions corresponding to the document 

21 analysis unit 10, the difference computation unit 20, and 

22 the XPaths update unit 30 of the annotation server 1500. If 

23 the annotation developer clicks on the button "Delete", the 

24 XPath is deleted and the annotation for the structured 

25 document 1511 is cleared. For an element (node) simply 

26 removed or modified in the structured document 1511, 

27 reference to its XPath becomes impossible. Here, a message 

28 may be output for notifying the annotation developer of the 

29 removal of the annotated element and asking the developer 

30 whether to add the annotation to another element. 

31 Although the foregoing describes addressing elements in a 
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1 structured document such as XML or HTML/XML document using 

2 XPath, this embodiment may also be applied to addressing 

3 elements in a structured document by any other means. 

4 Specifically, differences between an unmodified version and 

5 a modified version of a modified structured document may be 

6 computed by a function corresponding to the difference 

7 computation unit 20 described in this embodiment, and 

8 modifications may be made as suitable for means for 

9 addressing elements in the structured document (such as 

10 addressing information) . Then, the details of element 

11 designations may be appropriately updated according to 

12 modifications made to the structured document. 

13 Thus, as described above, the invention can keep a desired 

14 element properly addressed in a structured document in which 

15 particular elements are addressed, even if the structured 

16 document is modified. The invention can also provide means 

17 for automatically updating an XPath addressing a particular 

18 element in a structured document based on a modification 

19 made to the structured document if the structured document 

20 is modified. 

21 Variations described for the present invention can be 

22 realized in any combination desirable for each particular 

23 application. Thus particular limitations, and/or embodiment 

24 enhancements described herein, which may have particular 

25 advantages to the particular application need not be used 

26 for all applications. Also, not all limitations need be 

27 implemented in methods, systems and/or apparatus including 

28 one or more concepts of the present invention. 

29 The present invention can be realized in hardware, software, 

30 or a combination of hardware and software. A visualization 
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1 tool according to the present invention can be realized in a 

2 centralized fashion in one computer system, or in a 

3 distributed fashion where different elements are spread 

4 across several interconnected computer systems. Any kind of 

5 computer system - or other apparatus adapted for carrying 

6 out the methods and/or functions described herein - is 

7 suitable. A typical combination of hardware and software 

8 could be a general purpose computer system with a computer 

9 program that, when being loaded and executed, controls the 

10 computer system such that it carries out the methods 

11 described herein. The present invention can also be 

12 embedded in a computer program product, which comprises all 

13 the features enabling the implementation of the methods 

14 described herein, and which - when loaded in a computer 

15 system - is able to carry out these methods. 

16 Computer program means or computer program in the present 

17 context include any expression, in any language, code or 

18 notation, of a set of instructions intended to cause a 

19 system having an information processing capability to 

20 perform a particular function either directly or after 

21 conversion to another language, code or notation, and/or 

22 reproduction in a different material form. 

23 Thus, the invention includes an article of manufacture which 

24 comprises a computer usable medium having computer readable 

25 program code means embodied therein for causing a function 

26 described above. The computer readable program code means 

27 in the article of manufacture comprises computer readable 

28 program code means for causing a computer to effect the 

29 steps of a method of this invention. Similarly, the present 

30 invention may be implemented as a computer program product 

31 comprising a computer usable medium having computer readable 
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1 program code means embodied therein for causing a function 

2 described above. The computer readable program code means 

3 in the computer program product comprising computer readable 

4 program code means for causing a computer to effect one or 

5 more functions of this invention. Furthermore, the present 

6 invention may be implemented as a program storage device 

7 readable by machine, tangibly embodying a program of 

8 instructions executable by the machine to perform method 

9 steps for causing one or more functions of this invention. 

10 It is noted that the foregoing has outlined some of the more 

11 pertinent objects and embodiments of the present invention. 

12 This invention may be used for many applications. Thus, 

13 although the description is made for particular arrangements 

14 and methods, the intent and concept of the invention is 

15 suitable and applicable to other arrangements and 

16 applications. It will be clear to those skilled in the art 

17 that modifications to the disclosed embodiments can be 

18 effected without departing from the spirit and scope of the 

19 invention. The described embodiments ought to be construed 

20 to be merely illustrative of some of the more prominent 

21 features and applications of the invention. Other 

22 beneficial results can be realized by applying the disclosed 

23 invention in a different manner or modifying the invention 

24 in ways known to those familiar with the art. 
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